AITopics | task-agnostic exploration

Towards Principled Multi-Agent Task Agnostic Exploration

Zamboni, Riccardo, Mutti, Mirco, Restelli, Marcello

arXiv.org Artificial Intelligence, Feb-12-2025

In reinforcement learning, we typically refer to task-agnostic exploration when we aim to explore the environment without access to the task specification a priori. In a single-agent setting, the problem has been extensively studied and is mostly understood. A popular approach casts the task-agnostic objective as maximizing the entropy of the state distribution induced by the agent's policy, from which principles and methods follow. In contrast, little is known about task-agnostic exploration in multi-agent settings, which are ubiquitous in the real world. How should different agents explore in the presence of others? In this paper, we address this question through a generalization to multiple agents of the problem of maximizing the state distribution entropy. First, we investigate alternative formulations, highlighting their respective positives and negatives. Then, we present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. Finally, we provide proof-of-concept experiments to both corroborate the theoretical findings and pave the way for task-agnostic exploration in challenging multi-agent settings.
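The state-entropy objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes a small discrete state space and single-agent trajectories given as lists of state indices; the function name `state_entropy` and the toy trajectories are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def state_entropy(trajectories):
    """Shannon entropy (in nats) of the empirical state visitation distribution."""
    # Count how often each state appears across all trajectories.
    counts = Counter(s for traj in trajectories for s in traj)
    total = sum(counts.values())
    # H = -sum_s p(s) log p(s), with p(s) the visitation frequency of state s.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical rollouts: one policy mostly stays in state 0, the other covers all four states.
narrow = [[0, 0, 0, 1], [0, 0, 0, 0]]
broad = [[0, 1, 2, 3], [3, 2, 1, 0]]

print(state_entropy(narrow), state_entropy(broad))
```

A policy whose rollouts spread evenly over the states scores higher under this measure, which is the sense in which entropy maximization encourages exploration.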

artificial intelligence, machine learning, objective, (15 more...)

arXiv.org Artificial Intelligence

2502.08365

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Review for NeurIPS paper: Task-agnostic Exploration in Reinforcement Learning

Neural Information Processing Systems, Feb-11-2025, 22:56:37 GMT

There are quite a few existing exploration solutions for visiting all the states often, but these works were not compared or discussed. Concretely, I gave three examples. While reading the authors' rebuttal, I understood why two of them are less relevant to their specific setup. There are, however, many more works that I did not provide in my review and that are still relevant.

neurips paper, reinforcement learning, task-agnostic exploration, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Review for NeurIPS paper: Task-agnostic Exploration in Reinforcement Learning

Neural Information Processing Systems, Feb-11-2025, 22:56:29 GMT

This is a good paper that requires some minor tweaks to be camera-ready. All four reviewers supported acceptance, and two knowledgeable reviewers strongly supported it. R3 was the strongest critic but agreed that the author response addressed their major concerns. This is a good theory paper that is well written, precise, and accurate; the results provide new insights and generalize to other problem settings. The reviewers had some concerns over the framing of the contributions, in particular the novelty and utility of the proposed algorithm.

neurips paper, reinforcement learning, task-agnostic exploration, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Task-agnostic Exploration in Reinforcement Learning

Neural Information Processing Systems, Jan-26-2025, 09:02:21 GMT

Efficient exploration is one of the main challenges in reinforcement learning (RL). Most existing sample-efficient algorithms assume the existence of a single reward function during exploration. In many practical scenarios, however, there is not a single underlying reward function to guide the exploration, for instance, when an agent needs to learn many skills simultaneously, or multiple conflicting objectives need to be balanced. To address these challenges, we propose the task-agnostic RL framework: in the exploration phase, the agent first collects trajectories by exploring the MDP without the guidance of a reward function. After exploration, it aims at finding near-optimal policies for N tasks, given the collected trajectories augmented with sampled rewards for each task.
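The two phases described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: the toy chain MDP, the uniform-random exploration policy, the Bernoulli reward model, and the names `explore` and `relabel` are all hypothetical.

```python
import random


def explore(n_states, n_actions, horizon, episodes, rng):
    """Reward-free phase: roll out a uniform-random policy on a toy chain MDP."""
    data = []
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = rng.randrange(n_actions)
            # Toy dynamics (assumed): action 1 moves right, any other action moves left.
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            data.append((s, a, s_next))
            s = s_next
    return data


def relabel(data, reward_means, rng):
    """Attach one sampled Bernoulli reward per transition for a single task."""
    return [(s, a, float(rng.random() < reward_means[s][a]), s2) for s, a, s2 in data]


rng = random.Random(0)
data = explore(n_states=4, n_actions=2, horizon=5, episodes=10, rng=rng)

# Two hypothetical tasks, each given as a table of mean rewards indexed by [state][action].
tasks = [
    [[0.0, 0.0]] * 3 + [[1.0, 1.0]],  # task 0: reward only in the last state
    [[1.0, 1.0]] + [[0.0, 0.0]] * 3,  # task 1: reward only in the first state
]
datasets = [relabel(data, means, rng) for means in tasks]
```

The same reward-free transitions are reused for every task; only the reward labels differ, so any offline planner could then be run once per relabeled dataset.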

reinforcement learning, reward function, task-agnostic exploration, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.79)

Task-Agnostic Learning to Accomplish New Tasks

Zhang, Xianqi, Wang, Xingtao, Liu, Xu, Wang, Wenrui, Fan, Xiaopeng, Zhao, Debin

arXiv.org Artificial Intelligence, Feb-16-2023

Reinforcement Learning (RL) and Imitation Learning (IL) have made great progress in robotic control in recent years. However, these methods show obvious deterioration on new tasks that need to be completed through new combinations of actions. RL methods rely heavily on reward functions that cannot generalize well to new tasks, while IL methods are limited by expert demonstrations that do not cover new tasks. In contrast, humans can easily complete these tasks with the fragmented knowledge learned from task-agnostic experience. Inspired by this observation, this paper proposes a task-agnostic learning method (TAL for short) that can learn fragmented knowledge from task-agnostic data to accomplish new tasks. TAL consists of four stages. First, task-agnostic exploration is performed to collect data from interactions with the environment. The collected data is organized via a knowledge graph. Compared with the previous sequential structure, the knowledge graph representation is more compact and better suited to environment exploration. Second, an action feature extractor is proposed and trained using the collected knowledge graph data for task-agnostic fragmented knowledge learning. Third, a candidate action generator is designed, which applies the action feature extractor on a new task to generate multiple candidate action sets. Finally, an action proposal is designed to produce the probabilities for actions in a new task according to the environmental information. The probabilities are then used to select actions to be executed from multiple candidate action sets to form the plan. Experiments on a virtual indoor scene show that the proposed method outperforms the state-of-the-art offline RL method, CQL, by 35.28% and the IL method, BC, by 22.22%.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2209.041

Country: Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Policy Gradient Method for Task-Agnostic Exploration

Mutti, Mirco, Pratissoli, Lorenzo, Restelli, Marcello

arXiv.org Machine Learning, Jul-9-2020

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target. In particular, we present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, $k$-nearest neighbors estimate of the state distribution entropy. In contrast to known methods, MEPOL is completely model-free, as it requires neither estimating the state distribution of any policy nor modeling the transition dynamics. Then, we empirically show that MEPOL allows learning a maximum-entropy exploration policy in high-dimensional, continuous-control domains, and how this policy facilitates learning a variety of meaningful reward-based tasks downstream.
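To make the idea of a non-parametric $k$-nearest neighbors entropy estimate concrete, here is a minimal sketch of the classical Kozachenko-Leonenko estimator for a batch of continuous states. It is offered as an assumption-laden illustration and is not necessarily the exact estimator MEPOL optimizes; the names `knn_entropy` and `digamma_int` and the sample data are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer n: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN estimate (in nats) of differential entropy.

    Assumes all points are distinct (a zero distance would make the log undefined).
    """
    n, d = len(points), len(points[0])
    # Log-volume of the d-dimensional unit Euclidean ball.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th nearest neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n


# Hypothetical 2-D "states": one batch spread over the unit square, one confined to a corner.
rng = random.Random(0)
wide = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(300)]
narrow = [(rng.uniform(0, 0.1), rng.uniform(0, 0.1)) for _ in range(300)]

print(knn_entropy(wide), knn_entropy(narrow))
```

The estimate depends only on distances between sampled states, so no density model or transition model is fitted; a batch that covers more of the space yields a larger value.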

artificial intelligence, machine learning, task-agnostic exploration, (15 more...)

arXiv.org Machine Learning

2007.0464

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasks

Dasagi, Vibhavari, Lee, Robert, Bruce, Jake, Leitner, Jürgen

arXiv.org Machine Learning, Nov-19-2019

Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience are available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off-policy algorithms can in principle learn arbitrary tasks from a diverse enough fixed dataset. In this work, we evaluate popular exploration methods by generating robotics datasets for the purpose of learning to solve tasks completely offline without any further interaction in the real world. We present results on three popular continuous control tasks in simulation, as well as continuous control of a high-dimensional real robot arm. Code documenting all algorithms, experiments, and hyper-parameters is available at https://github.com/qutrobotlearning/batchlearning.

algorithm, exploration, learning, (15 more...)

arXiv.org Machine Learning

1911.08666

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
